• $\phi(y) = \dfrac{\sum_{i=1}^{\pi_\phi} \alpha_i y^i}{\sum_{i=1}^{\pi_\phi} \beta_i y^i}$ ($y \in \mathbb{R}$) and $\rho(x) = \dfrac{\sum_{j=1}^{\pi_\rho} \gamma_j x^j}{\sum_{j=1}^{\pi_\rho} \delta_j x^j}$ ($x \in \mathbb{R}$).

Direct substitution and rearrangement gives

$$g = \phi \circ \rho = \frac{\sum_{i=1}^{\pi_\phi} \alpha_i \left[\sum_{j=1}^{\pi_\rho} \gamma_j x^j\right]^i \left[\sum_{j=1}^{\pi_\rho} \delta_j x^j\right]^{\pi_\phi - i}}{\sum_{i=1}^{\pi_\phi} \beta_i \left[\sum_{j=1}^{\pi_\rho} \gamma_j x^j\right]^i \left[\sum_{j=1}^{\pi_\rho} \delta_j x^j\right]^{\pi_\phi - i}},$$

where we write $\theta = [\alpha, \beta, \gamma, \delta]$ and for simplicity set $\pi_\phi = \pi_\rho = \pi$. Thus $\dim \theta = 4\pi =: \nu$. For arbitrary but fixed $x$, $g$ is a rational function of degree $k = \pi$. No barrier is needed, so $q = 0$, and hence by (3.4),

$$\epsilon_\nu(\mathcal{F}) \ge c\,(\nu \log \nu + 1)^{-1}.$$

3.2 OPEN PROBLEMS

An obvious further question is whether results as in the previous section hold for multivariable approximation, perhaps for multivariable rational approximation.

A popular method of d-dimensional nonlinear spline approximation uses dyadic splines [2,5,8]. These are piecewise polynomial representations where the partition used is a dyadic decomposition. Given that such a partition $\Xi$ is a subset of a partition generated by the zero level set of a barrier polynomial of degree $\le |\Xi|$, can Vitushkin's results be applied to this situation? Note that in Vitushkin's theory it is the parametrization that is piecewise rational (PR), not the representation. What connections are there in general (if any) between PR representations and PR parametrizations?

4 DEGREE OF APPROXIMATION AND LEARNING

Determining the degree of approximation for given parametrized function classes is not only of curiosity value. It is now well understood that the statistical sample complexity of learning depends on the size of the approximating class. Ideally the approximating class is small whilst still approximating as large a class as possible.
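The composed rational network of Section 3.1 is easy to evaluate numerically. The following sketch (the coefficient values, the helper names `rat` and `g`, and the choice $\pi = 3$ are purely illustrative, not from the paper; `numpy` assumed) evaluates $g = \phi \circ \rho$ and confirms the parameter count $\nu = 4\pi$:

```python
import numpy as np

pi_ = 3  # pi_phi = pi_rho = pi (illustrative choice)

# hypothetical coefficient vectors theta = [alpha, beta, gamma, delta]
alpha = np.array([1.0, 0.5, 0.25])
beta = np.array([1.0, 1.0, 0.5])
gamma = np.array([1.5, -0.5, 0.2])
delta = np.array([1.0, 0.3, 0.1])

def rat(num, den, y):
    # sum_{i=1}^{pi} num_i y^i / sum_{i=1}^{pi} den_i y^i
    powers = np.array([y ** i for i in range(1, pi_ + 1)])
    return (num @ powers) / (den @ powers)

def g(x):
    # the "two-layer" rational network: g(x) = phi(rho(x))
    return rat(alpha, beta, rat(gamma, delta, x))

nu = sum(len(c) for c in (alpha, beta, gamma, delta))  # dim theta = 4*pi
```

For fixed $x$, `g` is rational in the parameters $\theta$, which is the property that lets Vitushkin's machinery apply.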
Furthermore, in order to make statements such as in [1] regarding the overall degree of approximation achieved by statistical learning, the classical degree of approximation is required.

1044    Williamson and Bartlett

For regression purposes the metric used is $L_{p,\mu}$, where

$$\|f\|_{L_{p,\mu}} := \left(\int |f(x)|^p \, d\mu(x)\right)^{1/p}$$

and $\mu$ is a probability measure. Ideally one would like to avoid calculating the degree of approximation for an endless series of different function spaces. Fortunately, for the case of spline approximation (with free knots) this is not necessary because (thanks to Petrushev and others) there now exist both direct and converse theorems characterizing such approximation classes. Let $S_n(f)_p$ denote the error of $n$-knot spline approximation in $L_p[0,1]$. Let $I$ denote the identity operator and $T(h)$ the translation operator ($T(h)(f,x) := f(x+h)$) and let $\Delta_h^k := (T(h) - I)^k$, $k = 1, 2, \ldots$, be the difference operators. The modulus of smoothness of order $k$ for $f \in L_p(\Omega)$ is

$$\omega_k(f,t)_p := \sup_{|h| \le t} \|\Delta_h^k f(\cdot)\|_{L_p(\Omega)}.$$

Petrushev [9] has obtained

Theorem 4.1  Let $\tau = (\alpha + 1/p)^{-1}$. Then

(4.1)    $\sum_{n=1}^{\infty} \left[n^{\alpha} S_n(f)_p\right]^{\tau} \frac{1}{n} < \infty$

if and only if

(4.2)    $\|f\|_{B^{\alpha}_{\tau,\tau,k}} < \infty$.

The somewhat strange quantity in (4.2) is the norm of $f$ in a Besov space $B^{\alpha}_{\tau,\tau,k}$. Note that for $\alpha$ large enough, $\tau < 1$. That is, the smoothness is measured in an $L_p$ ($p < 1$) space. More generally [11], we have (on domain $[0,1]$)

$$\|f\|_{B^{\alpha}_{p,q,k}} := \left(\int_0^1 \left(t^{-\alpha} \omega_k(f,t)_p\right)^q \frac{dt}{t}\right)^{1/q}.$$

Besov spaces are generalizations of classical smoothness spaces such as Sobolev spaces (see [11]).

We are interested in approximation in $L_{p,\mu}$ and, following Birman and Solomjak [2], ask what degree of approximation in $L_{p,\mu}$ can be obtained when the knot positions are chosen according to $\mu$ rather than $f$. This is of interest because it makes the problem of determining the parameter values on the basis of observations linear.
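The point of choosing the knots from $\mu$ alone is that the remaining fitting problem is linear in the spline coefficients. A minimal sketch of this idea (the density, quantile function, and piecewise-linear hat basis below are illustrative assumptions, not the construction used in the proof; `numpy` assumed):

```python
import numpy as np

# Hypothetical absolutely continuous mu on [0,1] with density 2x:
# CDF F(x) = x^2, so the quantile function is F^{-1}(u) = sqrt(u).
def mu_quantile(u):
    return np.sqrt(u)

n = 16
# knots placed by equal mu-mass, i.e. chosen from mu only (not from f)
knots = mu_quantile(np.linspace(0.0, 1.0, n + 1))

x = np.linspace(0.0, 1.0, 400)
f = x ** 2  # the function to be approximated (illustrative)

# piecewise-linear "hat" basis: column j interpolates the j-th unit vector
B = np.column_stack(
    [np.interp(x, knots, np.eye(n + 1)[j]) for j in range(n + 1)]
)

# determining the parameters is now an ordinary linear least-squares problem
coef, *_ = np.linalg.lstsq(B, f, rcond=None)
err = np.max(np.abs(f - B @ coef))
```

Because the knots are fixed before $f$ is seen, the nonlinearity of free-knot spline fitting disappears; Theorem 4.2 below quantifies what this costs in approximation rate.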
Theorem 4.2  Let $f \in L_{p,\mu}$ where $\mu \in L_\lambda$ for some $\lambda > 1$ and is absolutely continuous. Choose the $n$ knot positions of a spline approximant $v$ to $f$ on the basis of $\mu$ only. Then for all such $f$ there is a constant $c$ not dependent on $n$ such that

(4.3)    $\|f - v\|_{L_{p,\mu}} \le c\, n^{-\alpha} \|f\|_{B^{\alpha}_{\sigma,\sigma,k}}$,

where $\sigma = (\alpha + (1 - \lambda^{-1})p^{-1})^{-1}$ and $p < \sigma$. The constant $c$ depends on $\mu$ and $\lambda$.

Splines, Rational Functions and Neural Networks    1045

If $p \ge 1$ and $\sigma \le p$ then, for any $\lambda \ge 1$ and any $\alpha < \sigma^{-1}$, for all $f$ under the conditions above there is a $v$ such that

(4.4)    $\|f - v\|_{L_{p,\mu}} \le c\, n^{-(\alpha + 1/p - 1/\sigma)} \|f\|_{B^{\alpha}_{\sigma,\sigma,k}}$,

and again $c$ depends on $\mu$ and $\lambda$ but does not depend on $n$.

Proof  First we prove (4.3). Let $[0,1]$ be partitioned by $\Xi$. Thus if $v$ is the approximant to $f$ on $[0,1]$ we have

$$\|f - v\|^p_{L_{p,\mu}} = \sum_{\Delta \in \Xi} \|f - v\|^p_{L_{p,\mu}(\Delta)} = \sum_{\Delta \in \Xi} \int_\Delta |f(x) - v(x)|^p \, d\mu(x).$$

Now

$$\int_\Delta |f(x) - v(x)|^p \, d\mu(x) = \int_\Delta |f - v|^p \left(\frac{d\mu}{dx}\right) dx \le \left[\int_\Delta |f - v|^{p(1-\lambda^{-1})^{-1}} dx\right]^{1-\lambda^{-1}} \left[\int_\Delta \left(\frac{d\mu}{dx}\right)^{\lambda} dx\right]^{\lambda^{-1}} = \|f - v\|^p_{L_\psi(\Delta)} \, \|d\mu/dx\|_{L_\lambda(\Delta)},$$

where $\psi = p(1 - \lambda^{-1})^{-1}$. Now Petrushev and Popov [10, p. 216] have shown that there exists a polynomial of degree $k$ on $\Delta = [r,s]$ such that

$$\|f - v\|^p_{L_\psi(\Delta)} \le c \|f\|^p_{B(\Delta)},$$

where

$$\|f\|_{B(\Delta)} := \left(\int_0^{(s-r)/k} \left(t^{-\alpha} \|\Delta_t^k f(\cdot)\|_{L_\sigma(r,\,s-kt)}\right)^{\sigma} \frac{dt}{t}\right)^{1/\sigma}$$

and $\sigma := (\alpha + \psi^{-1})^{-1}$, $0 < \psi < \infty$ and $k > 1$. Let $|\Xi| =: n$ and choose $\Xi = \bigcup_i \Delta_i$ ($\Delta_i = [r_i, s_i]$) such that

$$\int_{\Delta_i} \left(\frac{d\mu}{dx}\right)^{\lambda} dx = \frac{1}{n}\, \|d\mu/dx\|^{\lambda}_{L_\lambda(0,1)}.$$

Thus $\|d\mu/dx\|_{L_\lambda(\Delta)} = n^{-1/\lambda} \|d\mu/dx\|_{L_\lambda(0,1)}$. Hence

(4.5)    $\|f - v\|^p_{L_{p,\mu}} \le c\, \|d\mu/dx\|_{L_\lambda} \sum_{\Delta \in \Xi} n^{-1/\lambda} \|f\|^p_{B(\Delta)}$.

Since (by hypothesis) $p < \sigma$, Hölder's inequality gives

$$\|f - v\|^p_{L_{p,\mu}} \le c\, \|d\mu/dx\|_{L_\lambda} \left[\sum_{\Delta \in \Xi} \left(\frac{1}{n}\right)^{\lambda^{-1}(1 - p/\sigma)^{-1}}\right]^{1 - p/\sigma} \left[\sum_{\Delta \in \Xi} \|f\|^{\sigma}_{B(\Delta)}\right]^{p/\sigma}.$$

Now for arbitrary partitions $\Xi$ of $[0,1]$ Petrushev and Popov [10, p. 216] have shown

$$\sum_{\Delta \in \Xi} \|f\|^{\sigma}_{B(\Delta)} \le \|f\|^{\sigma}_{B^{\alpha}_{\sigma,k}},$$

where $B^{\alpha}_{\sigma,k} = B^{\alpha}_{\sigma,\sigma,k} = B([0,1])$. Hence

$$\|f - v\|^p_{L_{p,\mu}} \le c\, \|d\mu/dx\|_{L_\lambda}\, n^{1 - p/\sigma - 1/\lambda} \|f\|^p_{B^{\alpha}_{\sigma,\sigma,k}}$$

and so

(4.6)    $\|f - v\|_{L_{p,\mu}} \le c\, \|d\mu/dx\|^{1/p}_{L_\lambda}\, n^{-\alpha} \|f\|_{B^{\alpha}_{\sigma,\sigma,k}}$,

with $\sigma = (\alpha + \psi^{-1})^{-1}$, $\psi = p(1 - \lambda^{-1})^{-1}$. Hence $\sigma = (\alpha + (1 - \lambda^{-1})p^{-1})^{-1}$. Thus given $\alpha$ and $p$, choosing different $\lambda$ adjusts the $\sigma$ used to measure $f$ on the right-hand side of (4.6). This proves (4.3).

Note that because of the restriction that $p < \sigma$, $\alpha > 1$ is only achievable for $p < 1$ (which is rarely used in statistical regression [6]). Note also the effect of the term $\|d\mu/dx\|^{1/p}_{L_\lambda}$. When $\lambda = 1$ this is identically 1 (since $\mu$ is a probability measure). When $\lambda > 1$ it measures the departure from the uniform distribution, suggesting the degree of approximation achievable under non-uniform distributions is worse than under uniform distributions.

Equation (4.4) is proved similarly. When $\sigma \le p$ with $p \ge 1$, for any $\alpha \le 1/\sigma$, we can set $\lambda := (1 - p/\sigma + p\alpha)^{-1} \ge 1$. From (4.5) we have

$$\|f - v\|^p_{L_{p,\mu}} \le c\, \|d\mu/dx\|_{L_\lambda} \sum_{\Delta \in \Xi} \left(\frac{1}{n}\right)^{1/\lambda} \|f\|^p_{B(\Delta)} \le c\, \|d\mu/dx\|_{L_\lambda} \left(\frac{1}{n}\right)^{1/\lambda} \left[\sum_{\Delta \in \Xi} \|f\|^{\sigma}_{B(\Delta)}\right]^{p/\sigma} \le c\, \|d\mu/dx\|_{L_\lambda}\, n^{-1 + p/\sigma - p\alpha} \|f\|^p_{B^{\alpha}_{\sigma,\sigma,k}}$$

and therefore

$$\|f - v\|_{L_{p,\mu}} \le c\, n^{-(\alpha + 1/p - 1/\sigma)} \|f\|_{B^{\alpha}_{\sigma,\sigma,k}},$$

which is (4.4). ∎

5 CONCLUSIONS AND FURTHER WORK

In this paper a result of Vitushkin has been applied to "multi-layer" rational approximation. Furthermore, the degree of approximation achievable by spline approximation with free knots, when the knots are chosen according to a probability distribution, has been examined.

The degree of approximation of neural networks, particularly multiple-layer networks, is an interesting open problem. Ideally one would like both direct and converse theorems, completely characterizing the degree of approximation. If it turns out that from an approximation point of view neural networks are no better than dyadic splines (say), then there is a strong incentive to study the PAC-like learning theory (of the style of [7]) for such spline representations. We are currently working on this topic.
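The direct theorems of Section 4 predict polynomial decay of spline approximation error in the number of knots. As a rough empirical companion (everything below is an illustrative sketch, not an experiment from the paper; `numpy` assumed): for a smooth target and piecewise-linear splines on uniform knots, doubling $n$ should shrink the error roughly like $n^{-2}$.

```python
import numpy as np

def spline_lsq_err(n, f, x):
    """Max error of a least-squares piecewise-linear fit with n uniform knots."""
    knots = np.linspace(0.0, 1.0, n + 1)
    B = np.column_stack(
        [np.interp(x, knots, np.eye(n + 1)[j]) for j in range(n + 1)]
    )
    coef, *_ = np.linalg.lstsq(B, f, rcond=None)
    return np.max(np.abs(f - B @ coef))

x = np.linspace(0.0, 1.0, 2000)
f = np.sin(2 * np.pi * x)  # a smooth target (illustrative)

e8 = spline_lsq_err(8, f, x)
e16 = spline_lsq_err(16, f, x)
ratio = e16 / e8  # expect roughly (8/16)^2 = 0.25 for a smooth target
```

Measured decay rates of this kind are what a converse theorem would translate back into a smoothness (Besov) statement about $f$.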
Acknowledgements

This work was supported in part by the Australian Telecommunications and Electronics Research Board and OTC. The first author thanks Federico Girosi for providing him with a copy of [4]. The second author was supported by an Australian Postgraduate Research Award.

References

[1] A. R. Barron, Approximation and Estimation Bounds for Artificial Neural Networks, to appear in Machine Learning, 1992.

[2] M. S. Birman and M. Z. Solomjak, Piecewise-Polynomial Approximations of Functions of the Classes $W_p^\alpha$, Mathematics of the USSR -- Sbornik, 2 (1967), pp. 295-317.

[3] L. Chua and A.-C. Deng, Canonical Piecewise-Linear Representation, IEEE Transactions on Circuits and Systems, 35 (1988), pp. 101-111.

[4] R. A. DeVore, Degree of Nonlinear Approximation, in Approximation Theory VI, Volume 1, C. K. Chui, L. L. Schumaker and J. D. Ward, eds., Academic Press, Boston, 1991, pp. 175-201.

[5] R. A. DeVore, B. Jawerth and V. Popov, Compression of Wavelet Decompositions, to appear in American Journal of Mathematics, 1992.

[6] H. Ekblom, Lp-methods for Robust Regression, BIT, 14 (1974), pp. 22-32.

[7] D. Haussler, Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications, Report UCSC-CRL-90-52, Baskin Center for Computer Engineering and Information Sciences, University of California, Santa Cruz, 1990.

[8] P. Oswald, On the Degree of Nonlinear Spline Approximation in Besov-Sobolev Spaces, Journal of Approximation Theory, 61 (1990), pp. 131-157.

[9] P. P. Petrushev, Direct and Converse Theorems for Spline and Rational Approximation and Besov Spaces, in Function Spaces and Applications (Lecture Notes in Mathematics 1302), M. Cwikel, J. Peetre, Y. Sagher and H. Wallin, eds., Springer-Verlag, Berlin, 1988, pp. 363-377.

[10] P. P. Petrushev and V. A. Popov, Rational Approximation of Real Functions, Cambridge University Press, Cambridge, 1987.

[11] H. Triebel, Theory of Function Spaces, Birkhäuser Verlag, Basel, 1983.

[12] A. G. Vitushkin, Theory of the Transmission and Processing of Information, Pergamon Press, Oxford, 1961. Originally published as Otsenka slozhnosti zadachi tabulirovaniya (Estimation of the Complexity of the Tabulation Problem), Fizmatgiz, Moscow, 1959.

[13] R. C. Williamson and U. Helmke, Existence and Uniqueness Results for Neural Network Approximations, submitted, 1992.