User:Melissa 4.0/Extension:FileFetcher
File Fetcher
| Release status | experimental |
|---|---|
| Implementation | Parser function, Tag |
| Description | Fetches files from outside the wiki for inclusion in pages using standard image syntax functionality |
| Author(s) | Douglas Williams (Melissa 4.0) |
| Latest version | 0.2.1 (2009-04-12) |
| MediaWiki | 1.14.0 |
| License | GPL |
| Download | No link |
Description
Whilst MediaWiki has solid support for the display of uploaded media, it does not always extend this functionality to external files (which have not been uploaded to the wiki). This can be an inconvenience for users who wish to maintain their media files separately from their wiki (for example, within a hierarchical directory structure on their hard disk), but would nonetheless like to take advantage of MediaWiki's image syntax functionality for formatting (border, frame, thumb, frameless), resizing, and captioning images.
This extension works by fetching any specified file from its path or URI, storing it into a repository, and making it accessible through the standard image syntax. Before using this extension, make sure to check whether any of the simpler alternatives would suffice for your needs.
Warning
This extension is intended only for personal or corporate wikis deployed in a high-trust environment. Its aim is to facilitate the retrieval of files from external sources, including the local filesystem. Used in an inappropriate setting, this extension would pose a severe security risk, since every user with write access to the wiki could access any file on the local filesystem (including files exposed through network shares) and inject malicious content.
Usage
The {{fetchfile:}} parser function fetches the file from the specified path or URI, and returns its file name as stored in the repository. Thus, it can easily be used to combine a web address with image syntax; for example:
[[File:{{fetchfile:http://upload.wikimedia.org/wikipedia/mediawiki/a/a9/Example.jpg}}|thumb|left|200px|This is a sample image.]]
The same syntax may be used with local paths:
[[File:{{fetchfile:C:\Downloaded Images\Example.jpg}}|thumb|left|200px|This is a sample image.]]
Either of the above would render the image as a 200-pixel-wide thumbnail floated to the left, captioned "This is a sample image."
The same syntax may also be used for image galleries, using the <fetchgallery> parser extension tag:
<fetchgallery widths="200" heights="150">
File:{{fetchfile:http://upload.wikimedia.org/wikipedia/mediawiki/a/a9/Example.jpg}}|Image fetched from web address.
File:{{fetchfile:C:\Downloaded Images\Example.jpg}}|Image fetched from local path.
</fetchgallery>
This would render a gallery of both images, captioned "Image fetched from web address." and "Image fetched from local path."
Download instructions
Please download the archive containing the PHP source files from the link given in the infobox above, and extract it into the $IP/extensions directory. (Note: $IP stands for the root directory of your MediaWiki installation, the same directory that holds LocalSettings.php.)
Installation
To install this extension, add the following to LocalSettings.php:
require_once("$IP/extensions/FileFetcher/FileFetcher.php");
# Add configuration settings here.
Documentation
Most of the extension is implemented through the following classes (with indentation indicating subclassing):
FileFetcher
: The main file fetcher class, instantiated using the singleton pattern. Responsible for fetching a file from its source using the appropriate SourceHandler subclass instance. In order to support the fetching of files from diverse sources using specific protocols (wrappers), all the functionality required for handling source files has been delegated to the SourceHandler class (as implemented in its protocol-specific subclasses). Similarly, in order to support the storing of files in diverse repositories, all the functionality required for handling target files has been delegated to the FileRepoHandler class (as implemented in its repository-specific subclasses).
SourceHandler
: Base class defining methods and members for handling the fetching of files from sources using a specific protocol (wrapper).
  FileSourceHandler
  : Base class defining methods and members for handling the fetching of files from file-based sources using PHP's built-in file functions.
    FSSourceHandler
    : Source handler responsible for fetching files from the local filesystem.
    HttpSourceHandler
    : Source handler responsible for fetching files over the HTTP or HTTPS protocol.
FileRepoHandler
: Base class defining methods and members for handling the storing of files into the associated file repository.
  ForeignRepoHandler
  : Base class defining methods and members for handling the storing of files into a foreign file repository registered with the MediaWiki core through the $wgForeignFileRepos array.
    FSRepoHandler
    : File repository handler for storing files through an FSRepo (filesystem foreign repository) instance initialized by the MediaWiki core.
Both source handlers and file repository handlers may be configured by modifying the values of their public class members. Source handlers are instantiated using the singleton pattern; the singleton instance of each concrete source handler class (namely, FSSourceHandler and HttpSourceHandler) may be accessed through its getInstance() static method. Each source handler has an associated file repository handler, which may be accessed through the former's $mFileRepoHandler member.
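The class hierarchy and singleton access pattern described above can be pictured with the following skeleton. It is a hypothetical sketch, not the extension's source: only the class and member names come from this page, the subclass relationships follow the indented list, and the per-class instance registry, the constructor, and the default values are assumptions.

```php
<?php
// Hypothetical skeleton of the documented class hierarchy. Only class and
// member names come from the documentation; bodies and defaults are illustrative.

abstract class FileRepoHandler
{
}

abstract class ForeignRepoHandler extends FileRepoHandler
{
    // Structure from which the core initializes the associated FileRepo.
    public $mForeignRepoInfo = array();
}

class FSRepoHandler extends ForeignRepoHandler
{
}

abstract class SourceHandler
{
    // One singleton instance per concrete subclass, keyed by class name.
    private static $instances = array();

    // File repository handler associated with this source handler.
    public $mFileRepoHandler;

    protected function __construct()
    {
        // Assumption: each source handler starts with its own repository handler.
        $this->mFileRepoHandler = new FSRepoHandler();
    }

    public static function getInstance()
    {
        $class = static::class;
        if (!isset(self::$instances[$class])) {
            self::$instances[$class] = new static();
        }
        return self::$instances[$class];
    }
}

abstract class FileSourceHandler extends SourceHandler
{
}

class FSSourceHandler extends FileSourceHandler
{
    // Illustrative default only.
    public $mHandleNonUris = true;
}

class HttpSourceHandler extends FileSourceHandler
{
}

// FileFetcher (the main class) is omitted; it would dispatch each request
// to the appropriate source handler singleton.
```

Under this structure, the configuration statements on this page simply write public members of the per-class singletons, and two source handlers can share one repository handler by assigning the same object to both $mFileRepoHandler members.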
Source handlers
The main configuration members of the source handlers are the following:
SourceHandler::$mFileRepoHandler
: The file repository handler associated with this source handler. Multiple source handler instances may share the same file repository handler instance; in such cases, the settings of the latter are shared as well.
SourceHandler::$mFileSizeMax
: The maximum size of a fetched file, in bytes. Files exceeding this size are not fetched, and an exception is thrown. Set to a negative value to refrain from enforcing a maximum size (not recommended).
SourceHandler::$mTolerateUnknownFileSize
: Whether to tolerate unknown file sizes. A file size may be unknown either because the source does not support reporting it, or because an error was encountered whilst the source handler was determining it. If set to true, the file is fetched anyway when its size is unknown. If set to false, the file is not fetched when its size is unknown.
FSSourceHandler::$mHandleNonUris
: Whether to handle any path which is not recognized as a URI. Set to true to handle non-URI file path representations, such as "C:\Image.jpg" on Windows and "/home/name/Image.jpg" on Linux. Set to false to refrain from handling any path which is not recognized as a URI. Valid URLs belonging to the file URI scheme (starting with "file:") are always handled in either case.
For example:
# Do not allow the filesystem source handler to handle non-URI paths.
# All filesystem paths would need to be specified using the file URI scheme.
FSSourceHandler::getInstance()->mHandleNonUris = false;
# Set the maximum size of files to be fetched over HTTP to 200 KiB (204,800 bytes).
HttpSourceHandler::getInstance()->mFileSizeMax = 200 * 1024;
File repository handlers
The target file name for a file to be saved in the associated repository of a file repository handler is constructed from the following components:
- the source file name (excluding extension)
- a space character, ' '
- the generated hash
- a dot character, '.', if the source file extension is non-empty
- the source file extension
The FileRepoHandler class defines three length preservation members affecting the construction of target file names:
FileRepoHandler::$mPreservedFileNameLength
: The maximum length of the source file name (excluding extension) to be preserved when constructing the target file name.
FileRepoHandler::$mPreservedFileExtensionLength
: The maximum length of the source file extension to be preserved when constructing the target file name.
FileRepoHandler::$mPreservedHashLength
: The maximum length of the generated hash to be preserved when constructing the target file name. If set to 0, the hash is not generated at all.
These members determine how their associated components are used in the construction of the target file name, in the following manner:
- Set to a positive value n to preserve the first n characters of the component.
- Set to 0 to preserve nothing of the component.
- Set to a negative value to preserve the full component (not recommended for arbitrary-length components).
If the constructed target file name is empty, an exception is thrown. This would happen if, for example, all the component length preservation members are set to 0.
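The construction and length preservation rules above can be sketched as follows. This is a hypothetical illustration with invented function names. Two points are assumptions: the space is inserted only when both the preserved file name and the preserved hash are non-empty, and the dot depends on the preserved (rather than the original) extension, which is what makes an all-zero configuration yield an empty name as described. Lengths are counted in bytes via substr(), in keeping with the lack of Unicode support noted under "Bugs and limitations".

```php
<?php
// Hypothetical sketch of the documented target-file-name rules.

/**
 * Applies one length preservation setting: n > 0 keeps the first n bytes,
 * 0 keeps nothing, and a negative value keeps the whole component.
 */
function preserveComponent($component, $length)
{
    if ($length < 0) {
        return $component;
    }
    // Cast because older PHP versions return false for an empty input.
    return (string) substr($component, 0, $length);
}

function buildTargetFileName($fileName, $extension, $hash,
    $nameLength, $extensionLength, $hashLength)
{
    $name = preserveComponent($fileName, $nameLength);
    $ext = preserveComponent($extension, $extensionLength);
    // A hash length of 0 means no hash is generated at all.
    $hash = ($hashLength === 0) ? '' : preserveComponent($hash, $hashLength);

    $target = $name;
    if ($name !== '' && $hash !== '') {
        $target .= ' '; // Assumed condition for the separating space.
    }
    $target .= $hash;
    if ($ext !== '') {
        $target .= '.' . $ext;
    }
    if ($target === '') {
        throw new RuntimeException('Constructed target file name is empty.');
    }
    return $target;
}
```

For example, buildTargetFileName('Example', 'jpg', 'a1b2c3d4e5', 40, 10, 8) yields "Example a1b2c3d4.jpg" under these assumptions.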
The other configuration members of the file repository handlers are the following:
FileRepoHandler::$mUseMemoryString
: Whether to read the contents of the source file into a string in memory before storing it into the file repository. If set to true, the file is read into a string in memory before being saved to the file repository. If set to false, the file is copied directly to the file repository.
FileRepoHandler::$mHashContents
: Whether to generate the hash from the actual contents of the source file, rather than from its path or URI. Set to true to generate the hash from the actual contents of the source file. Set to false to generate the hash from the file path or URI of the source file. This setting has no effect if $mPreservedHashLength is set to 0.
FileRepoHandler::$mParenthesizeHash
: Whether to introduce parentheses, "(...)", around the generated hash (if non-empty) when constructing the target file name.
ForeignRepoHandler::$mForeignRepoInfo
: The foreign repository structure (array) from which the MediaWiki core initializes the FileRepo instance associated with this handler. One may change its element values or replace it altogether; however, it should never be set to null after being registered with the MediaWiki core through the $wgForeignFileRepos array.
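Two of these members, $mUseMemoryString and $mParenthesizeHash, can be illustrated with a minimal sketch. The function names and error handling are hypothetical; the code only shows the difference between reading the file into memory (so the string remains available for reuse) and copying it directly, and how a non-empty hash is wrapped in parentheses.

```php
<?php
// Hypothetical sketch of $mUseMemoryString and $mParenthesizeHash.
// Function names are illustrative, not the extension's API.

/**
 * Stores $source at $target. Returns the contents read into memory when
 * $useMemoryString is true (so they can be reused), or null when the
 * file was copied directly.
 */
function storeFetchedFile($source, $target, $useMemoryString)
{
    if ($useMemoryString) {
        $contents = file_get_contents($source);
        if ($contents === false || file_put_contents($target, $contents) === false) {
            throw new RuntimeException("Could not store '$source' via memory.");
        }
        return $contents;
    }
    if (!copy($source, $target)) {
        throw new RuntimeException("Could not copy '$source' to '$target'.");
    }
    return null;
}

/** Wraps a non-empty hash in parentheses when requested. */
function formatHash($hash, $parenthesizeHash)
{
    return ($parenthesizeHash && $hash !== '') ? "($hash)" : $hash;
}
```

Returning the in-memory string is what would let a caller hash the contents without reading the file a second time.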
Provided that $mPreservedHashLength is not set to 0, hash generation works as follows:
- If $mHashContents is set to false, then the hash is generated from the source file path or URI (i.e. from the string containing the path or URI itself), without any consideration of the file contents.
- If $mHashContents is set to true, then the hash is generated from the file contents, in the following manner:
  - If $mUseMemoryString is set to true, then the hash is generated from the file contents as previously read into a string in memory.
  - If $mUseMemoryString is set to false, then the hash is generated from the file contents as read during the hash operation itself.
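These hash-generation rules can be sketched as a single function. The documentation does not name the hash algorithm, so SHA-1 via PHP's sha1() and sha1_file() is an assumption, as are the function name and parameters; the caller is assumed to pass the in-memory contents only when $mUseMemoryString is true.

```php
<?php
// Hypothetical sketch of the documented hash-generation rules.
// Assumption: SHA-1 (hex digest); the real algorithm is not documented.

/**
 * @param string      $source       Source path or URI.
 * @param bool        $hashContents Value of FileRepoHandler::$mHashContents.
 * @param string|null $memoryString Contents already read into memory
 *                                  ($mUseMemoryString = true), else null.
 */
function generateHash($source, $hashContents, $memoryString = null)
{
    if (!$hashContents) {
        // Hash the path or URI string itself; the file is never read.
        return sha1($source);
    }
    if ($memoryString !== null) {
        // $mUseMemoryString = true: reuse the contents already in memory.
        return sha1($memoryString);
    }
    // $mUseMemoryString = false: read the file during the hash operation.
    $hash = sha1_file($source);
    if ($hash === false) {
        throw new RuntimeException("Could not read '$source' for hashing.");
    }
    return $hash;
}
```

Hashing the path or URI, as in the first branch, avoids reading the file at all; the HTTP configuration example that follows relies on exactly this property.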
For example:
# Set the file repository handler associated with the HTTP source handler to generate the hash for the
# target file name from the source URI itself rather than from the source file contents.
# This way, the file would not have to be retrieved at all if it already exists in the repository.
HttpSourceHandler::getInstance()->mFileRepoHandler->mHashContents = false;
# Set the file repository handler associated with the filesystem source handler to read the contents of
# the source file into a string in memory before storing it in the repository.
# This way, should its $mHashContents be set to true, the file would
# not need to be read twice (once for the hash and once for the direct copy),
# since the hash would be performed on the string in memory.
FSSourceHandler::getInstance()->mFileRepoHandler->mUseMemoryString = true;
# Set the directory into which the file repository handlers associated with both source handlers
# store the fetched files.
FSSourceHandler::getInstance()->mFileRepoHandler->mForeignRepoInfo[ 'directory' ] = "$IP/images/fsfetched";
HttpSourceHandler::getInstance()->mFileRepoHandler->mForeignRepoInfo[ 'directory' ] = "$IP/images/httpfetched";
# Set the base URLs through which the files will be accessed by clients to correspond
# with the above directory. Always ensure that the paths correspond (unless you
# have set up aliases through your web server).
FSSourceHandler::getInstance()->mFileRepoHandler->mForeignRepoInfo[ 'url' ] = "$wgScriptPath/images/fsfetched";
HttpSourceHandler::getInstance()->mFileRepoHandler->mForeignRepoInfo[ 'url' ] = "$wgScriptPath/images/httpfetched";
Simpler alternatives
If one simply wishes to allow inline images hosted externally (hotlinking) without requiring any image syntax functionality, simply set $wgAllowExternalImages to true. For images from the local filesystem, one could use the file URI scheme after enabling it through $wgUrlProtocols. (Note that, as a security measure, some browsers, including Mozilla Firefox, will not follow file URLs on pages that have been loaded via HTTP. For details and workarounds, see the MozillaZine article "Links to local pages do not work".)
# Allow external images to be rendered inline with text.
$wgAllowExternalImages = true;
# Enable links to files on the local filesystem through the file: protocol.
$wgUrlProtocols[] = "file://";
MediaWiki does offer some support for manipulating external media with standard image syntax functionality if one registers the foreign repository with the $wgForeignFileRepos array. For the local filesystem, the relevant core repository class is FSRepo.
# Define the filesystem foreign repository structure and register it with MediaWiki.
$wgForeignFileRepos[] = array
(
# The class name for the repository.
# The core repository classes are LocalRepo, ForeignDBRepo, FSRepo, and ForeignAPIRepo.
'class' => "FSRepo",
# A unique name for the repository.
'name' => "MyRepo",
# The root directory in which the files are located.
'directory' => "$IP/myfiles",
# The number of directory levels for hash-based division of files.
'hashLevels' => 0,
# The base public URL.
'url' => "$wgScriptPath/myfiles",
);
FSRepo has two limitations (which we aimed to address through our extension):
- All files must reside in a single directory. Subdirectories are only permitted for dividing files according to the hash level.
- All files must be accessible through a URL.
Finally, if one's aim is simply to automate the upload of large quantities of files, then this extension should not be used. The file repository handler implemented in this extension does not register any of the fetched files with the MediaWiki local repository (LocalRepo), and thus loses out on the benefits that such integration would offer. This was a design decision, since we wanted a solution which kept the files as separate from MediaWiki as possible. In fact, the file fetching process does not make any changes to the wiki's database, and the directory in which the fetched files are stored may be safely deleted at any time without breaking the system. (In the case of deletion, the files would be fetched again the next time one of the pages in which they appear is visited.)
Bugs and limitations
- Unicode is not supported yet.